[poster] Expanding Hubverse Evaluation Metrics and Dashboard Support#34
Conversation
Adds the project poster for expanding hubverse evaluation metrics and dashboard support (five mini-sprints: UI polish, config-driven enhancements, scale transforms, variogram score, documentation). Also corrects the README poster instructions to use `project-posters/` instead of `posters/`, matching the actual convention in the repo. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
seabbs
left a comment
This looks good to me. The variogram score has already landed on scoringutils `main`, and there should be a CRAN release this week. We expect some small changes (e.g. to documentation) to make it easier to use, but nothing breaking.
project-posters/eval-metrics-expansion/eval-metrics-expansion.md
**hubPredEvalsData changes:**
- Add `transform_defaults` (top-level) and per-target `transform` to `inst/schema/v1.1.0/config_schema.json`
- Allowed transform functions: `log_shift`, `sqrt`, `log1p`, `log`, `log10`, `log2`
- `append: true/false` — when true, scores.csv gains a `scale` column (`"natural"` or transform label)
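To make the proposed config concrete, here is a hypothetical sketch of what the schema additions above might look like in a predevals config file. The property names (`transform_defaults`, `transform`, `append`) come from the list above; the overall layout and the target names are assumptions for illustration only — hubPredEvalsData#34 is the authoritative spec.

```json
{
  "transform_defaults": {
    "transform": "log_shift",
    "append": true
  },
  "targets": [
    {
      "target_id": "wk inc flu hosp",
      "transform": "sqrt"
    },
    {
      "target_id": "wk flu hosp rate category",
      "transform": false
    }
  ]
}
```

Here the second (pmf-style) target opts out with `transform: false`, following the semantics discussed later in this thread.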
Given you want a wide table, it might be good to think through whether hubPredEvalsData should also output a wide table. This would make the table easier to present; I'm not sure about the evals visualisation. It would be great to hear @matthewcornell's thoughts on this.
Sorry, I don't understand enough to answer yet. Is there a summary of the specific changes you're asking about? I haven't touched any of the score-loading/manipulating code in predevals at this point.
I think we want to treat scores on a transformed scale as a separate score, consistently throughout. This means, I think, treating each as a separate column, which will mean new columns in the tables and new variable names in the menu selectors for the plots. So maybe this needs to be updated so that scores.csv would return new columns for the transformed scores, rather than a new column for scale?
@annakrystalli are you saying that hubPredEvalsData currently outputs a long table but that we might want to change it to output a wide table given the requirements that we want the eventual table to be displayed in wide format?
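To illustrate the long-vs-wide distinction being discussed (a toy sketch, not hubverse code — the column names `model_id`, `metric`, and the score values are invented for illustration):

```python
import pandas as pd

# Long form: one row per (model, score), as a scores.csv with a
# per-score identifier column might look.
long_scores = pd.DataFrame({
    "model_id": ["A", "A", "B", "B"],
    "metric": ["wis", "wis_log", "wis", "wis_log"],
    "value": [10.0, 1.2, 8.0, 0.9],
})

# Wide form: one column per score, so a transformed-scale score
# ("wis_log") becomes its own column rather than a scale label.
wide_scores = (
    long_scores
    .pivot(index="model_id", columns="metric", values="value")
    .reset_index()
)
print(wide_scores)
```

In the wide form, the transformed score is just another column, which is what the "new columns for the transformed scores" suggestion above would mean for the tables and plot selectors.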
Thanks for putting this together, @nickreich. Having a single overview of the full project scope across all the workstreams is really valuable.
I have some high-level structural comments in addition to my inline comments.
1. Remove AGENTS.md
We use Claude Code interactively rather than as autonomous agents; we just point it at the relevant document for context. This file duplicates content from the poster (pipeline architecture, sprint structure, design decisions, open questions, key files), creating two documents to review and keep in sync. The poster itself is sufficient.
2. Development standards section doesn't belong here
The "Development standards" section (issue refinement format, TDD workflow, universal DoD) prescribes team-wide methodology; that's a separate discussion, not something to embed in a project-specific poster. If we agree on a standard workflow, it should live in a team-level contributing guide (or even a Claude Code skill) where it's discoverable and reusable. As it stands, it's also something we haven't discussed as a team.
3. Level of detail
The poster has a lot of implementation-level detail (per-issue DoD checklists, file-level change specs, validation behaviour specifics) that goes beyond what's expected in a high-level project poster. A poster should capture the problem space, workstream overview, sequencing, and risks; the implementation detail belongs in issues in the relevant repos.
4. Consistency with existing planning work
Some of the Sprint C planning was already done in detail in hubPredEvalsData#34, and restating it here has introduced some inconsistencies:
- The Sprint C DoD says `transform: null` for opt-out but the issue's schema uses `transform: false`; these have different semantics.
- The poster says config applying a transform to a pmf target "fails validation," but the issue specifies a two-tier approach (error if explicitly set, warn if inherited from defaults).
The Sprint D section also introduces `joint_across` as a new config property name, but the underlying hubEvals parameter is `compound_taskid_set`, which is also what we use in `tasks.json`. These are actually opposite concepts, so I'd suggest using `compound_taskid_set` consistently to avoid confusion.
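For context on the established term, `compound_taskid_set` appears in hubverse `tasks.json` under the sample output type's parameters. A rough sketch from memory (field names and placement should be checked against the hubverse schema docs; the values here are illustrative):

```json
{
  "output_type": {
    "sample": {
      "output_type_id_params": {
        "is_required": false,
        "type": "character",
        "max_length": 10,
        "min_samples_per_task": 100,
        "max_samples_per_task": 100,
        "compound_taskid_set": ["reference_date", "location"]
      }
    }
  }
}
```

Reusing this name in the poster keeps the predevals config aligned with terminology modelers already see in hub configuration.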
Sprint C should link directly to hubPredEvalsData#34.
The README fix looks good.
This is super helpful, @nickreich. Overall I agree with @annakrystalli's points. WRT the impact on the UI component and other areas I've been contributing to, I think I'd need to sit down and go over some concrete examples to understand the changes. As I said above, I haven't worked with the scoring data in the UI, just interface stuff (if that makes sense). Re: "predevals JS changes":
Are we saying that these are a kind of virtual/dynamic score that has to be added at UI time, rather than being generated as separate columns? If so, this would make me nervous. It would help me if we could review the changes I'll be responsible for together in detail, so I can understand the implications before we move too far along.
From @matthewcornell
No. All scores will be computed and generated as separate columns beforehand. No UI computation.
- Remove AGENTS.md (duplicated poster content); migrate pipeline diagram and repo links into the poster's "What do we already know" section
- Remove Development Standards section (team methodology, not project scope)
- Trim per-sprint DoD checklists to brief acceptance criteria
- Slim down Sprint C to reference hubPredEvalsData#34 as the authoritative implementation plan, eliminating duplicated/conflicting detail
- Fix `transform: null` to `transform: false` (matching hubPredEvalsData#34)
- Replace `joint_across` with `compound_taskid_set` throughout (matching established hubverse terminology)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Per a brief conversation today w/ @nickreich, here's the plan we came up with when we made our last hubPredEvalsData schema change (adding the …
matthewcornell
left a comment
I think it looks good. Thanks, Nick.
Co-authored-by: Nicholas G Reich <nick@umass.edu>
Summary
This PR adds the project poster for expanding the hubverse forecast evaluation ecosystem and fixes a README inconsistency.
Project poster (`project-posters/eval-metrics-expansion/`)
- `AGENTS.md` for AI agent / contributor onboarding context

README fix
Corrects the poster creation instructions to use `project-posters/<project>/` instead of `posters/<project>/`, matching the actual convention used by all existing posters in the repo.

Review timeline
1 week suggested.
🤖 Generated with Claude Code